The RICIS Concept


The University of Houston-Clear Lake established the Research Institute for
Computing and Information Systems in 1986 to encourage NASA Johnson Space
Center and local industry to actively support research in the computing and
information sciences. As part of this endeavor, UH-Clear Lake proposed a
partnership with JSC to jointly define and manage an integrated program of research
in advanced data processing technology needed for JSC's main missions, including
administrative, engineering and science responsibilities. JSC agreed and entered into
a cooperative agreement with UH-Clear Lake to
jointly plan and execute such research through RICIS. Additionally, under
Cooperative Agreement NCC 9-16, computing and educational facilities are shared
by the two institutions to conduct the research.

The mission of RICIS is to conduct, coordinate and disseminate research on
computing and information systems among researchers, sponsors and users from
UH-Clear Lake, NASA/JSC, and other research organizations. Within UH-Clear
Lake, the mission is being implemented through interdisciplinary involvement of
faculty and students from each of the four schools: Business, Education, Human
Sciences and Humanities, and Natural and Applied Sciences.

Other research organizations are involved via the "gateway" concept. UH-Clear
Lake establishes relationships with other universities and research organizations,
having common research interests, to provide additional sources of expertise to
conduct needed research.

A major role of RICIS is to find the best match of sponsors, researchers and
research objectives to advance knowledge in the computing and information
sciences. Working jointly with NASA/JSC, RICIS advises on research needs,
recommends principals for conducting the research, provides technical and
administrative support to coordinate the research, and integrates technical results
into the cooperative goals of UH-Clear Lake and NASA/JSC.


Creation of Fully Vectorized 
FORTRAN Code for Integrating the 
Movement of Dust Grains in 
Interplanetary Environments 



Preface 


This research was conducted under the auspices of the Research Institute for 
Computing and Information Systems by Walter Colquitt, of the Houston Area 
Research Center. A. Glen Houston, Director of RICIS, served as technical 
representative for this activity. 

Funding has been provided by the Solar System Exploration Division, Space 
and Life Sciences Directorate, NASA/JSC through Cooperative Agreement NCC 
9-16 between NASA Johnson Space Center and the University of Houston-Clear 
Lake. The NASA technical monitor for this activity was Herbert Zook, of the Space 
Science Branch, Solar System Exploration Division, NASA/JSC. 

The views and conclusions contained in this report are those of the author and 
should not be interpreted as representative of the official policies, either express or 
implied, of NASA or the United States Government. 
Final Report


As both the calendar time and the computer allotment are coming to a close,
it is time for the final report of all that has happened. The purpose of this
contract was to improve the performance of a specific FORTRAN computer code from
the Planetary Sciences Division of NASA/JSC when used on a modern vectorizing
supercomputer. The code is used to calculate orbits of dust grains that separate
from comets and asteroids. It accounts for the influences of the Sun and 8
planets (neglecting Pluto), the solar wind, and solar light pressure including
Poynting-Robertson drag. The calculations allow one to study the motion of these
particles as they are influenced by the Earth or one of the other planets. Some
of these particles become trapped in resonances just beyond the Earth for long
periods of time. These integer-period resonances vary from 3 orbits of the Earth
per 2 orbits of the particle to as high as 14 to 13.
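
To give a feel for where "just beyond the Earth" lies, the resonance ratios above can be turned into orbit sizes with Kepler's third law (a**3 = T**2 with a in AU and T in years). The sketch below is illustrative only and is not part of the delivered code. It assumes first-order resonances of the form (q+1):q between the two quoted endpoints, it ignores the reduction in effective solar gravity caused by light pressure, and its program and variable names are invented for the example. It is written in free-form Fortran for brevity. Under those assumptions the 3:2 resonance sits near 1.31 AU and the 14:13 resonance near 1.05 AU.

```fortran
! Sketch: semi-major axis of exterior (q+1):q resonances with the Earth.
! Assumptions: Kepler's third law only (no light pressure), and only
! first-order resonances between the 3:2 and 14:13 cases in the text.
program resonance_axes
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: q
  real(dp) :: period, a

  do q = 2, 13
     ! Earth completes q+1 orbits while the particle completes q,
     ! so the particle period in years is (q+1)/q.
     period = real(q + 1, dp) / real(q, dp)
     ! Kepler's third law in AU and years: a**3 = period**2.
     a = period**(2.0_dp / 3.0_dp)
     print '(i3, a, i2, 2f10.4)', q + 1, ':', q, period, a
  end do
end program resonance_axes
```

Each output line lists the resonance, the particle period in years, and the semi-major axis in AU.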

The code is about 700 lines of fairly generic Fortran and it has run on
a variety of computers - VAX, PC, SX-2, and Cray. Initial runs showed that the
code was not well structured for vector computers; this is hardly surprising, as
it had never been run on a vector machine before. Initially there were three
problems that hampered vectorization:

   o  vectors too short, with a length of 8 (planets) or 3 (x-y-z
      components of motion)

   o  computed GOTOs in the innermost loop of the integrator

   o  subroutine calls embedded in inner loops

The first of these degrades performance because of the overhead of vector
loop startup, while the latter two defeat any vectorization attempts by the
compiler.

Several runs using the Analyzer were made to isolate the "hot spots" so
we could concentrate our tuning efforts where they would do the most good. A
series of modifications was then made to improve the performance. First, the
computed GOTOs were removed by placing the selected code "behind" an IF test
(see below). This vectorized on the NEC but not on the Cray. We also made an
attempt to increase vector length by treating the three orthogonal components of
the motion vectors for each of the 8 planets as a single vector. This increased
the vector length to 24 and improved run time by more than 50%. Unfortunately,
the data structures became too complicated for easy maintenance and
modification. The data layout also departed from common practice, so it was very
difficult to implement outside suggestions. In the end this attempt was set
aside.
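
The idea behind the length-24 vectors can be sketched as follows. This is not the abandoned production layout, which the report does not reproduce. It is a minimal free-form Fortran illustration with invented names (flatten_demo, drift_flat, pos_a, pos_b, vel) and a made-up update step. It relies on the fact that a (3,8) array is stored contiguously and can therefore be handed to a routine that treats it as one vector of 24 elements.

```fortran
! Sketch: collapse a 3 x 8 nested loop into one loop of length 24.
! All names and the "drift" update are hypothetical illustrations.
program flatten_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ncomp = 3, nplan = 8
  real(dp), parameter :: dt = 0.5_dp
  real(dp) :: pos_a(ncomp, nplan), pos_b(ncomp, nplan), vel(ncomp, nplan)
  integer :: i, j

  do j = 1, nplan
     do i = 1, ncomp
        pos_a(i, j) = real(i, dp)
        vel(i, j)   = real(10*j + i, dp)
     end do
  end do
  pos_b = pos_a

  ! Nested form: the inner (vectorizable) loop has length 3 only.
  do j = 1, nplan
     do i = 1, ncomp
        pos_a(i, j) = pos_a(i, j) + dt * vel(i, j)
     end do
  end do

  ! Flattened form: the same storage viewed as one vector of 24.
  call drift_flat(ncomp*nplan, pos_b, vel, dt)

  print *, 'max difference:', maxval(abs(pos_a - pos_b))

contains

  subroutine drift_flat(n, x, v, h)
    integer, intent(in) :: n
    real(dp), intent(inout) :: x(n)
    real(dp), intent(in) :: v(n), h
    integer :: k
    do k = 1, n
       x(k) = x(k) + h * v(k)
    end do
  end subroutine drift_flat

end program flatten_demo
```

The flattening is easy for an element-wise update like this one. It becomes awkward as soon as a calculation needs to know which planet or which component an element belongs to, which is consistent with the maintenance problems described above.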

The numerical integrator used is the implicit Runge-Kutta formulation by
Edgar Everhart. This integrator has achieved great popularity in the study of
planetary motions because of its predictor-corrector capability, which yields
high performance while still maintaining excellent accuracy and stability. A
second time-step integrator was also investigated: the popular Everhart
integrator was replaced with a Bulirsch-Stoer integrator (see ref #1). This
integrator produced very good answers, but its performance deteriorated and
became unacceptable on highly eccentric orbits.

By this time many spots of inefficiency in the code had been found and
corrected. These were such things as common subexpression identification by
enclosing items in parentheses, precalculation prior to a DO loop, strength
reduction such as low-power exponentiation replaced by repeated multiplication,
and the like. Several cases of stride problems were corrected by reversing the
order of subscripts in local work arrays. Also, the compiler was given more
information by changing the dimension statement for dummy subroutine arguments
from DIMENSION X(1) to X(*), indicating that the length is determined at run
time - it is NOT of length 1! The cumulative effect of these improvements
gradually became quite noticeable.
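
Three of these tunings can be shown side by side in a small free-form Fortran sketch. The routine names, argument names, and the formula itself are invented for illustration and are not taken from the production code. The "after" routine declares its dummy arrays as assumed-size (*), hoists the loop-invariant product out of the loop, and replaces a cube with repeated multiplication.

```fortran
! Sketch of three tunings: assumed-size dummies, hoisting an invariant,
! and strength reduction. Names and formula are hypothetical.
program tuning_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: x(n), before(n), after(n)
  integer :: k

  do k = 1, n
     x(k) = 1.0_dp + 0.25_dp * real(k, dp)
  end do

  call inv_cube_before(n, x, 2.0_dp, 3.0_dp, before)
  call inv_cube_after(n, x, 2.0_dp, 3.0_dp, after)
  print *, 'max difference:', maxval(abs(before - after))

contains

  ! Untuned: the product g*s and the power are evaluated on every pass.
  subroutine inv_cube_before(m, r, g, s, res)
    integer, intent(in) :: m
    real(dp), intent(in) :: r(*), g, s
    real(dp), intent(out) :: res(*)
    integer :: i
    do i = 1, m
       res(i) = g * s / r(i)**3
    end do
  end subroutine inv_cube_before

  ! Tuned: invariant computed once, cube written as multiplications.
  subroutine inv_cube_after(m, r, g, s, res)
    integer, intent(in) :: m
    real(dp), intent(in) :: r(*), g, s
    real(dp), intent(out) :: res(*)
    real(dp) :: gs
    integer :: i
    gs = g * s
    do i = 1, m
       res(i) = gs / (r(i) * r(i) * r(i))
    end do
  end subroutine inv_cube_after

end program tuning_demo
```

The two results should agree to within floating-point rounding. A modern optimizing compiler may well perform the hoisting and strength reduction on its own, so the benefit of doing this by hand depends on the compiler.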

Finally, a clean version of the Everhart integrator with the planets moved
in first order eccentric Keplerian orbits was created. This code finally showed
the kind of performance we were looking for - about 100,000 to 150,000 simulation
years per hour of chargeable CPU. This version was run for a short simulation
period of 200 years of planetary motion on the SX-2, a Cray XMP-24, and the
SuperTek. Run times were 5.5 sec, 11.2 sec, and 107 sec, respectively.
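
As a rough cross-check, the three run times can be converted to simulated years per CPU hour. The free-form Fortran sketch below assumes that the quoted times are CPU seconds and that cost scales linearly with simulated time, neither of which the report states outright; job startup costs in such a short run would bias the figures low. The program and variable names are invented. Under these assumptions the SX-2 works out to roughly 131,000 years per hour, inside the 100,000 to 150,000 range quoted above, with the Cray near 64,000 and the SuperTek near 6,700.

```fortran
! Sketch: convert the 200-year benchmark times into years per CPU hour.
! Assumes linear scaling and that the times are CPU seconds.
program throughput
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: sim_years = 200.0_dp
  real(dp), parameter :: cpu_sec(3) = (/ 5.5_dp, 11.2_dp, 107.0_dp /)
  character(len=10), parameter :: machine(3) = &
       (/ 'SX-2      ', 'Cray XMP  ', 'SuperTek  ' /)
  integer :: i

  do i = 1, 3
     print '(a, f12.0, a)', machine(i), &
          sim_years / cpu_sec(i) * 3600.0_dp, &
          ' simulated years per CPU hour'
  end do
end program throughput
```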

A series of final runs were made in a real production mode and one of these 
found resonance trapping by the Earth. 

The following is the major change that was made to Everhart's integrator
to create the version RA15SX.FOR.

      DO 40 I = 1, 3
         GOTO (10, 20, 30), I
   10    CONTINUE
C        do something when I = 1
         GOTO 40
   20    CONTINUE
C        do something when I = 2
         GOTO 40
   30    CONTINUE
C        do something when I = 3
         GOTO 40
   40 CONTINUE


was changed to 

      DO 40 I = 1, 3
         IF ( I .EQ. 1 ) THEN
C           do something when I = 1
         END IF

         IF ( I .EQ. 2 ) THEN
C           do something when I = 2
         END IF

         IF ( I .EQ. 3 ) THEN
C           do something when I = 3
         END IF
   40 CONTINUE

On the SX-2 the latter vectorizes and the former does not - this is because
of the vector mask test registers in the machine. On the Cray neither fully
vectorizes, but the latter creates a "vector scalar" loop, whatever that is. Even
so, relative performance was very good considering hardware speeds.
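
The two forms above are schematic. A compilable toy version, with made-up branch bodies and invented names, shows that the rewrite leaves the results unchanged. It is written in free-form Fortran, and it says nothing about speed: whether either loop vectorizes depends on the compiler and the machine, as described above.

```fortran
! Sketch: computed GOTO versus IF tests, with hypothetical branch bodies.
program branch_forms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(3), a(3), b(3)
  integer :: i

  x = (/ 1.5_dp, 2.5_dp, 3.5_dp /)

  ! Original style: a computed GOTO picks the branch for each I.
  do 40 i = 1, 3
     go to (10, 20, 30), i
10   continue
     a(i) = x(i) + 1.0_dp
     go to 40
20   continue
     a(i) = 2.0_dp * x(i)
     go to 40
30   continue
     a(i) = x(i) * x(i)
40 continue

  ! Modified style: each branch sits behind its own IF test.
  do i = 1, 3
     if (i == 1) then
        b(i) = x(i) + 1.0_dp
     end if
     if (i == 2) then
        b(i) = 2.0_dp * x(i)
     end if
     if (i == 3) then
        b(i) = x(i) * x(i)
     end if
  end do

  ! Both loops perform identical arithmetic, so a and b should match.
  print *, 'max difference:', maxval(abs(a - b))
end program branch_forms
```

Computed GOTO and labeled DO loops are obsolescent in current Fortran standards, so some compilers may warn about the first loop.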


Enclosed with this report are two 5 1/4" floppy disks. One contains the
results of the test runs made on the Cray. There are two groups of four files
- submitted JCL, compilation listings, day file listings, and final output
answers.

The second disk is the final delivery product. Its contents are as
follows:

NOFF.FOR       A utility program to remove tabs, blank lines, and nonprintable
               characters from a file. This is a very handy preprocessor for
               files to be shipped over DECNET.

RA15.FOR       Original Everhart integrator with the code cleaned up for
               easier reading. Carefully verified to provide the same results
               as the original.

RA15SX.FOR     Everhart integrator modified for vectorization. Produces the
               same answers as the original.

DELIVERY.JCL   SX-2/OS JCL to run the job.

E9E_MAIN.FOR   Final version of the Encke run that produced trapping.

FOURL.FOR and E9E.MAIN
               Two versions: four planets only, and full planet stepwise
               integrated motion. The former had problems with accuracy and
               the latter with performance.

HALLEY.JCL     Final version of a Halley dust grain run; no trapping found.

X171361.SXO    A full blown run including JCL, FORTRAN expanded listing, and
               results of this run. Processed by NOFF.FOR.

FINL_ENC.JCL   This is input JCL, including the final fully cleaned up
               source code with the vector version of RA15SX. THIS IS THE
               FINAL PRODUCT OF THIS CONTRACT.


The Future 

There are at least two possibilities for further investigation. One would
be more code improvements. This might be to take the integrator inner loop and
pull it inside of the IF tests, replicating the total loop each time. I'm not
fully convinced this could be made to work, or even that the improvement would
be worth it. Performance improvement would probably be slight: the
overhead of the IF test is very low on the NEC and fairly low on the Cray, while
the inner loops behind the IF would be very short, so vector startup would be
expensive. This would especially penalize the Cray.

Current:

      DO J = 1, N
         DO K = 1, L
            IF ( J .EQ. 1 ) THEN
               do K code for when J = 1
            etc.

would be changed to

      DO J = 1, N
         IF ( J .EQ. 1 ) THEN
            DO K = 1, L
               each time replicate the entire DO K loop as appropriate for J.
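
A compilable toy version of this proposed restructuring is sketched below in free-form Fortran. The loop bounds, array names, and branch bodies are invented for illustration, and the real integrator loop is far more involved. The sketch only shows that moving the test on J outside and replicating the K loop per branch gives the same results.

```fortran
! Sketch: IF tests inside the K loop versus a replicated K loop per branch.
! Sizes, names, and branch bodies are hypothetical.
program replicate_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nj = 3, nk = 5
  real(dp) :: cur(nk, nj), alt(nk, nj)
  integer :: j, k

  ! Current structure: the test on J is repeated on every K pass.
  do j = 1, nj
     do k = 1, nk
        if (j == 1) then
           cur(k, j) = real(k, dp)
        else if (j == 2) then
           cur(k, j) = 2.0_dp * real(k, dp)
        else
           cur(k, j) = real(k * k, dp)
        end if
     end do
  end do

  ! Proposed structure: test J once, then run that branch's own K loop.
  do j = 1, nj
     if (j == 1) then
        do k = 1, nk
           alt(k, j) = real(k, dp)
        end do
     else if (j == 2) then
        do k = 1, nk
           alt(k, j) = 2.0_dp * real(k, dp)
        end do
     else
        do k = 1, nk
           alt(k, j) = real(k * k, dp)
        end do
     end if
  end do

  print *, 'max difference:', maxval(abs(cur - alt))
end program replicate_loop
```

Each replicated K loop is free of branches, but it is also short, which is the vector startup cost the text above warns about.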


The second possibility would be to use the current code and begin to
investigate the parameter space of initial dust particle orbital elements in
order to limit the areas of interest. This would be especially useful for the
100-200 micron particles, which seem more susceptible to trapping. These larger
particles are harder to calculate, however, because their orbits decay more
slowly.